Towards Basic Categories for Describing Properties of Texts in a Corpus
Abstract
The paper discusses basic principles for describing the properties of texts to be stored in a corpus and proposes a standard that is used in the majority of corpora developed at the University of Leeds and could be employed for describing texts in any corpus collection activity. The standard defines the minimal subset of tags and attributes necessary for describing texts stored in a corpus. The proposed text typology helps to position a corpus under development relative to a reference corpus covering all possible features, by explicitly selecting the subset of features to be considered in a study.
Similar resources
Using Web Corpus Statistics to Infer Conceptual Structure
The basic level is the level of conceptual structure at which categories are maximally informative. In this research, we investigated whether the privileged status of the basic level might be captured by the statistical properties of the Web. Using Google’s Web search programming interface, we found that frequency ratios for terms across three levels of abstraction (superordinate, basic, and su...
Towards a reference corpus of web genres
Genres of spoken and written texts are being intensively studied from various angles, e.g., communication studies, discourse analysis, computational linguistics, without arriving at a generally accepted definition. Many corpora have been built to represent the language, but very few large corpora indicate genres, and when they do the typology of genres varies widely. For instance, the Brown cor...
Vocabulary Lists for EAP and Conversation Students
Despite the abundance of research investigating general and academic vocabularies and developing dozens of word lists, few studies have compared academic vocabulary with general service word lists such as conversation vocabulary. Many EAP researchers assume that university students need to know all the words in West’s (1953) General Service List (GSL) as a prerequisite to academic words (e.g., ...
Move-based investigation of appraisal in the introduction section of Applied Linguistics research articles: Similarities and differences between L1 and L2 English texts
Recent research has shown that academic writing is not ‘author-evacuated’ but, rather, carries a representation of the writers’ identity. One way through which writers project their identity in academic writing is stance-taking toward propositions advanced in the text. Appropriate stance-taking has proved to be challenging for novice writers of Research Articles (RAs), especially those writing ...
The System of Engagement in a Sample of Prose Fiction and the News
Emerging within Systemic Linguistics, Appraisal/Evaluation is a framework for analyzing the language of evaluation, providing techniques for the systematic analysis of evaluation and stance as they operate in whole texts and in groupings of texts. There are three systems in the Appraisal framework: Attitude, Engagement, and Graduation. This study sets out to analyze the use of the system of Eng...
Publication date: 2004